Regret for Expected Improvement over the Best-Observed Value and Stopping Condition

Authors

Vu Nguyen

Sunil Gupta

Santu Rana

Cheng Li

Svetha Venkatesh

Abstract

Bayesian optimization (BO) is a sample-efficient method for global optimization of expensive, noisy, black-box functions using probabilistic methods. The performance of a BO method depends on its selection strategy, which is expressed through the acquisition function. Expected improvement (EI) is one of the most widely used acquisition functions for BO; it computes the expectation of the improvement function over the incumbent. The incumbent is usually chosen as the best value observed so far, termed ymax (for a maximization problem). Recent work has studied the convergence rate of EI under some mild assumptions or with noise-free observations. In particular, Wang and de Freitas (2014) derived a sublinear regret bound for EI under stochastic noise. However, because of the difficulty of the stochastic noise setting, and to make the convergence proof feasible, they use an alternative incumbent: the maximum of the Gaussian process predictive mean, μmax. This modification makes the algorithm computationally inefficient because it requires an additional global optimization step to estimate μmax, which is costly and may be inaccurate. To address this issue, we derive a sublinear convergence rate for EI using the commonly used ymax. Moreover, our analysis is the first to study a stopping criterion for EI to prevent unnecessary evaluations. Our analysis complements the results of Wang and de Freitas (2014), so that both incumbent settings for EI are covered theoretically. Finally, we demonstrate empirically that EI using ymax is both more computationally efficient and more accurate than EI using μmax.
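To make the role of the incumbent concrete, the sketch below evaluates the standard closed-form EI for a Gaussian predictive distribution, using the best-observed value ymax as the incumbent. The function names, the candidate values, and the threshold-based `should_stop` rule are illustrative assumptions; in particular, the stopping rule is a simple placeholder and not necessarily the criterion analyzed in the paper.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, incumbent):
    """Closed-form EI for a Gaussian prediction N(mu, sigma^2), maximization.

    `incumbent` is the reference value to improve on; here it is assumed
    to be y_max, the best value observed so far.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mu - incumbent, 0.0)
    z = (mu - incumbent) / sigma
    return (mu - incumbent) * normal_cdf(z) + sigma * normal_pdf(z)


def should_stop(ei_values, threshold=1e-6):
    """Hypothetical rule: stop when the largest EI falls below a threshold."""
    return max(ei_values) < threshold


# Illustrative (made-up) observations and GP predictions at three candidates.
observed_y = [0.2, 0.9, 0.5]
y_max = max(observed_y)  # incumbent: a lookup, no extra global optimization
candidates = [(0.8, 0.3), (1.0, 0.05), (0.4, 0.0)]  # (mean, std) pairs
ei = [expected_improvement(m, s, y_max) for m, s in candidates]
print(ei, should_stop(ei))
```

Taking ymax as the incumbent only requires a maximum over the observed values, whereas μmax would need a separate global optimization of the predictive mean at every iteration.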

Similar resources

A Regret Minimization Approach in Product Portfolio Management with respect to Customers’ Price-sensitivity

In an uncertain and competitive environment, product portfolio management (PPM) becomes more challenging for manufacturers to decide what to make and establish the most beneficial product portfolio. In this paper, a novel approach in PPM is proposed in which the environment uncertainty, competitors’ behavior and customer’s satisfaction are simultaneously considered as the most important criteri...

Optimal Stopping Policy for Multivariate Sequences a Generalized Best Choice Problem

In the classical versions of “Best Choice Problem”, the sequence of offers is a random sample from a single known distribution. We present an extension of this problem in which the sequential offers are random variables but from multiple independent distributions. Each distribution function represents a class of investment or offers. Offers appear without any specified order. The objective is...

Learning Unknown Markov Decision Processes: A Thompson Sampling Approach

We consider the problem of learning an unknown Markov Decision Process (MDP) that is weakly communicating in the infinite horizon setting. We propose a Thompson Sampling-based reinforcement learning algorithm with dynamic episodes (TSDE). At the beginning of each episode, the algorithm generates a sample from the posterior distribution over the unknown model parameters. It then follows the opti...

Weighted Bandits or: How Bandits Learn Distorted Values That Are Not Expected

Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the cost distributions: the classic K-armed bandit and the linearly parameterized bandit. In both settings, we propose algorithms that are inspired by Upper Confidence B...

Robust Solutions for the DWDM Routing and Provisioning Problem: Models and Algorithms

The dense wavelength division multiplexing routing and provisioning problem with uncertain demands and a fixed budget is modeled as a multicriteria optimization problem. To obtain a robust design for this problem, the primary objective is to minimize a regret function that models the total amount of over and/or under provisioning in the network resulting from uncertainty in a demand forecast. P...

Publication date: 2017
